Big Data From Scientific Simulations

نویسندگان

  • John Edwards
  • Sidharth Kumar
  • Valerio Pascucci
چکیده

Scientific simulations often generate massive amounts of data used for debugging, restarts, and scientific analysis and discovery. Challenges that practitioners face using these types of big data are unique. Of primary importance is speed of writing data during a simulation, but this need for fast I/O is at odds with other priorities, such as data access time for visualization and analysis, efficient storage, and portability across a variety of supercomputer topologies, configurations, file systems, and storage devices. The computational power of high-performance computing systems continues to increase according to Moore’s law, but the same is not true for I/O subsystems, creating a performance gap between computation and I/O. This chapter explores these issues, as well as possible optimization strategies, the use of in situ analytics, and a case study using the PIDX I/O library in a typical simulation.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Big Data – The New Science of Complexity

Data-intensive techniques, now widely referred to as ‘big data’, allow for novel ways to address complexity in science. I assess their impact on the scientific method. First, big-data science is distinguished from other scientific uses of information technologies, in particular from computer simulations. Then, I sketch the complex and contextual nature of the laws established by data-intensive ...

متن کامل

The Need for Resilience Research in Workflows of Big Compute and Big Data Scientific Applications

Projections and reports about exascale failure modes conclude that we need to protect numerical simulations and data analytics from an increasing risk of hardware and software failures and silent data corruptions (SDC) [1, 4]. At this scale, hardware and software failures could be as frequent as ten or more per day. According to [9], the semiconductor industry will have increased difficulty pre...

متن کامل

HPC and Big Data Convergence for Extreme Heterogeneous Systems

As the data deluge grows ever greater, large-scale data analytics workloads are quickly becoming critical computational tools within the scientific community. Recently, convergence efforts have focused on combining aspects HPC and ”big data” analytics workloads together using a unified supercomputing system. This has the opportunity to bring advanced analytical tools to scientists which enable ...

متن کامل

Orchestrating Science DMZs for Big Data Acceleration: Challenges and Approaches

In recent years, most scientific research in both academia and industry has become increasingly data-driven. According to market estimates, spending related to supporting scientific dataintensive research is expected to increase to $5.8 billion by 2018 [1]. Particularly for dataintensive scientific fields such as bioscience, or particle physics within academic environments, data storage/process...

متن کامل

Toward A Unified HPC and Big Data Runtime

The landscape of high performance computing (HPC) has radically changed over the past decade as the community has well surpassed Petascale performance and aims for Exascale. In this effort, chip fabrication and hardware architects have been directly challenged by the fundamentals of physics of chip manufacturing. The effects of these challenges have extended beyond the underlying hardware requi...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014